Exploiting Data-Independence for Fast Belief-Propagation
Abstract
MAP-inference in graphical models requires that we maximize the sum of two terms: a data-dependent term, encoding the conditional likelihood of a certain labeling given an observation, and a data-independent term, encoding some prior on labelings. Often, the data-dependent factors contain fewer latent variables than the data-independent factors. We note that MAP-inference in any such graphical model can be made substantially faster by appropriately preprocessing its data-independent terms. Our main result is to show that message-passing in any such pairwise model has an expected-case exponent of only 1.5 on the number of states per node, leading to significantly faster algorithms than the standard quadratic-time solution.

'Data-Independence'

MAP-inference in a graphical model $G$ consists of solving an optimization problem of the form
$$\hat{y} = \operatorname*{argmax}_{y} \sum_{C \in \mathcal{C}} \Phi_C(y_C),$$
where $\mathcal{C}$ is the set of cliques in the model. Often, the model can be further factorized if we make a distinction between the latent variables $y$ and the observation $x$:
$$\hat{y}(x) = \operatorname*{argmax}_{y} \underbrace{\sum_{F \in \mathcal{F}} \Phi_F(y_F \mid x_F)}_{\text{data-dependent}} + \underbrace{\sum_{C \in \mathcal{C}} \Phi_C(y_C)}_{\text{data-independent}}.$$
We say that those cliques containing only latent variables are data-independent. In many models, the cliques that contain an observed variable have fewer latent variables than the purely latent cliques, i.e., each $F \in \mathcal{F}$ is a proper subset of some $C \in \mathcal{C}$. Examples of such models are shown below.

Example Models

[Figure: examples of graphical models to which our results apply; cliques containing observations have fewer latent variables than purely latent cliques.] In other words, cliques containing a grey (observed) node encode the data likelihood, whereas cliques containing only white (latent) nodes encode priors. We focus on cases where the grey nodes have degree one, i.e., each is connected to only one white node. In such cases we obtain an $\Omega(\sqrt{N})$ speedup in terms of the number of states per node.

Message-Passing

In these models, message-passing between two cliques $A = (i, j)$ and $B = (j, k)$ takes the form
$$m_{A \to B}(y_j) = \Psi_j(y_j) + \max_{y_i} \bigl[ \Psi_i(y_i) + \Phi_{i,j}(y_i, y_j) \bigr], \qquad (1)$$
which is equivalent to matrix-vector multiplication in the max-sum semiring. In a recent paper [1], we showed that matrix-matrix multiplication in this semiring takes expected time $O(N^{2.5})$ for $N \times N$ matrices. In our current work, we note that a similar result applies to matrix-vector multiplication, so long as the matrix is known in advance. Since the 'matrix' in equation (1) simply encodes a prior, it can be preprocessed offline.
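To make equation (1) concrete, here is a minimal NumPy sketch of the naïve quadratic message computation that this result improves upon; the function name and array representation are our own illustration rather than notation from the paper.

```python
import numpy as np

def message_naive(psi_i, psi_j, phi_ij):
    """Naive O(N^2) evaluation of equation (1).

    psi_i, psi_j : length-N unary potentials.
    phi_ij       : N x N pairwise potential (the data-independent prior).

    Returns m[y_j] = psi_j[y_j] + max_{y_i} (psi_i[y_i] + phi_ij[y_i, y_j]),
    i.e. a matrix-vector product in the max-sum semiring, with phi_ij
    playing the role of the 'matrix' and psi_i that of the 'vector'.
    """
    return psi_j + np.max(psi_i[:, None] + phi_ij, axis=0)
```

Each of the $N$ output entries takes a maximum over $N$ terms, hence the quadratic cost that the offline preprocessing reduces to an expected-case exponent of 1.5.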
How it Works

We wish to compute $\max_i v_a[i] + v_b[i]$.

[Figure: Steps 1-5 of the search, illustrated on two length-16 vectors $v_a$ and $v_b$; each panel shows the permutations $p_a$ and $p_b$ that sort $v_a$ and $v_b$ in decreasing order alongside the sorted values, and a red line labeled "don't search past this line".]

Arrows connect corresponding elements of $v_a$ and $v_b$, as sorted by $p_a$ and $p_b$. We draw a red line connecting the leftmost arrowheads that have been seen so far. Any 'arrows' whose tail lies to the right of this line cannot possibly correspond to an optimal solution (see the sketch after the experiments).

Experiments

[Figure: two plots. Left: "Number of online operations per message entry" as a function of $N$ (the number of states), comparing the naïve method against our method, with the curves $2\sqrt{N}$ and $2\sum_{m=0}^{\lfloor N/2 \rfloor} \frac{(N-m)!\,(N-m)!}{(N-2m)!\,N!}$ overlaid. Right: total wall time in seconds as a function of $N$ for a 2500-node chain with random potentials, comparing the naïve method against our method, each shown with a fitted curve.]
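The lockstep search illustrated under 'How it Works' can be written down compactly. Below is a minimal Python sketch under our reading of the construction: $p_a$ and $p_b$ are the permutations sorting $v_a$ and $v_b$ in decreasing order (for a message, $v_b$ is a column of the pairwise prior, so $p_b$ can be computed offline); the function name and the self-check are our own.

```python
import numpy as np

def max_of_sum(va, vb, pa, pb):
    """Compute max_i va[i] + vb[i], given permutations pa and pb that sort
    va and vb into decreasing order."""
    best = -np.inf
    seen_a, seen_b = set(), set()
    for i, j in zip(pa, pb):  # walk both sorted orders in lockstep
        best = max(best, va[i] + vb[i], va[j] + vb[j])
        seen_a.add(i)
        seen_b.add(j)
        # Stop once some index has been reached in BOTH orders: every index
        # not yet visited is then dominated in both va and vb by that index,
        # so it cannot beat `best`. This is the role of the red line in the
        # figure. For random orderings the first such collision occurs after
        # O(sqrt(N)) steps in expectation (a birthday-paradox argument).
        if i in seen_b or j in seen_a:
            break
    return best

# Self-check against the naive maximum on random vectors.
rng = np.random.default_rng(0)
va, vb = rng.standard_normal(500), rng.standard_normal(500)
pa, pb = np.argsort(-va), np.argsort(-vb)
assert np.isclose(max_of_sum(va, vb, pa, pb), np.max(va + vb))
```

To form a full message as in equation (1), this search would run once per output state $y_j$, with $v_b$ taken as the $y_j$-th column of $\Phi_{i,j}$ (its sort order precomputed offline, since the prior is data-independent) and $v_a = \Psi_i$ (sorted once online and reused across all $N$ entries); this is what yields the expected-case exponent of 1.5 per message, matching the $2\sqrt{N}$-type curves in the left experimental plot.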
Similar resources
In-Network Nonparametric Loopy Belief Propagation on Sensor Networks for Ad-Hoc Localization
Sensor networks provide a cheap, unobtrusive, and easy-to-deploy method for gathering large quantities of data from an environment. While this data is often noisy, we can compensate by exploiting spatial correlation. This paper proposes the use of the statistical inference method of Loopy Belief Propagation (LBP) to exploit this correlation structure in the context of a well-examined problem in...
Independence of Causal Influence and Clique Tree Propagation
This paper explores the role of independence of causal influence (ICI) in Bayesian network inference. ICI allows one to factorize a conditional probability table into smaller pieces. We describe a method for exploiting the factorization in clique tree propagation (CTP), the state-of-the-art exact inference algorithm for Bayesian networks. We also present empirical results showing that the resul...
MapReduce Lifting for Belief Propagation
Judging by the increasing impact of machine learning on large-scale data analysis in the last decade, one can anticipate a substantial growth in diversity of the machine learning applications for “big data” over the next decade. This exciting new opportunity, however, also raises many challenges. One of them is scaling inference within and training of graphical models. Typical ways to address t...
High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains
In this paper, we present a novel framework incorporating a combination of sparse models in different domains. We posit the observed data as generated from a linear combination of a sparse Gaussian Markov model (with a sparse precision matrix) and a sparse Gaussian independence model (with a sparse covariance matrix). We provide efficient methods for decomposition of the data into two domains, ...
Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However, there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficient...
Online Belief Propagation for Topic Modeling
Not only can online topic modeling algorithms extract topics from big data streams with constant memory requirements, but they can also detect topic shifts as the data stream flows. Fast convergence speed is a desired property for batch learning topic models such as latent Dirichlet allocation (LDA), which can further facilitate developing fast online topic modeling algorithms for big data streams. ...
Publication date: 2010